Text File | 1988-10-20 | 47.6 KB | 1,070 lines |
-
-
- File: internals, Node: Passes, Next: RTL, Prev: Interface, Up: Top
-
- Passes and Files of the Compiler
- ********************************
-
- The overall control structure of the compiler is in `toplev.c'. This file
- is responsible for initialization, decoding arguments, opening and closing
- files, and sequencing the passes.
-
- The parsing pass is invoked only once, to parse the entire input. The RTL
- intermediate code for a function is generated as the function is parsed, a
- statement at a time. Each statement is read in as a syntax tree and then
- converted to RTL; then the storage for the tree for the statement is
- reclaimed. Storage for types (and the expressions for their sizes),
- declarations, and a representation of the binding contours and how they
- nest remains until compilation of the function is finished; these are all
- needed to output the debugging information.
-
- Each time the parsing pass reads a complete function definition or
- top-level declaration, it calls the function `rest_of_compilation' or
- `rest_of_decl_compilation' in `toplev.c', which are responsible for all
- further processing necessary, ending with output of the assembler language.
- All other compiler passes run, in sequence, within `rest_of_compilation'.
- When that function returns from compiling a function definition, the
- storage used for that function definition's compilation is entirely freed,
- unless it is an inline function (*Note Inline::.).
-
- Here is a list of all the passes of the compiler and their source files.
- Also included is a description of where debugging dumps can be requested
- with `-d' options.
-
- * Parsing. This pass reads the entire text of a function definition,
- constructing partial syntax trees. This and RTL generation are no
- longer truly separate passes (formerly they were), but it is easier to
- think of them as separate.
-
- The tree representation does not entirely follow C syntax, because it
- is intended to support other languages as well.
-
- C data type analysis is also done in this pass, and every tree node
- that represents an expression has a data type attached. Variables are
- represented as declaration nodes.
-
- Constant folding and associative-law simplifications are also done
- during this pass.
-
- The source files for parsing are `parse.y', `decl.c', `typecheck.c',
- `stor-layout.c', `fold-const.c', and `tree.c'. The last three are
- intended to be language-independent. There are also header files
- `parse.h', `c-tree.h', `tree.h' and `tree.def'. The last two define
- the format of the tree representation.
-
- * RTL generation. This is the conversion of syntax tree into RTL code.
- It is actually done statement-by-statement during parsing, but for
- most purposes it can be thought of as a separate pass.
-
- This is where the bulk of target-parameter-dependent code is found,
- since often it is necessary for strategies to apply only when certain
- standard kinds of instructions are available. The purpose of named
- instruction patterns is to provide this information to the RTL
- generation pass.
-
- Optimization is done in this pass for `if'-conditions that are
- comparisons, boolean operations or conditional expressions. Tail
- recursion is detected at this time also. Decisions are made about how
- best to arrange loops and how to output `switch' statements.
-
- The source files for RTL generation are `stmt.c', `expr.c',
- `explow.c', `expmed.c', `optabs.c' and `emit-rtl.c'. Also, the file
- `insn-emit.c', generated from the machine description by the program
- `genemit', is used in this pass. The header file `expr.h' is used
- for communication within this pass.
-
- The header files `insn-flags.h' and `insn-codes.h', generated from the
- machine description by the programs `genflags' and `gencodes', tell
- this pass which standard names are available for use and which
- patterns correspond to them.
-
- Aside from debugging information output, none of the following passes
- refers to the tree structure representation of the function (only part
- of which is saved).
-
- The decision of whether the function can and should be expanded inline
- in its subsequent callers is made at the end of RTL generation. The
- function must meet certain criteria, currently related to the size of
- the function and the types and number of parameters it has. Note that
- this function may contain loops, recursive calls to itself
- (tail-recursive functions can be inlined!), and gotos; in short, all
- constructs supported by GNU CC.
-
- The option `-dr' causes a debugging dump of the RTL code after this
- pass. This dump file's name is made by appending `.rtl' to the input
- file name.
-
- * Jump optimization. This pass simplifies jumps to the following
- instruction, jumps across jumps, and jumps to jumps. It deletes
- unreferenced labels and unreachable code, except that unreachable code
- that contains a loop is not recognized as unreachable in this pass.
- (Such loops are deleted later in the basic block analysis.)
-
- Jump optimization is performed two or three times. The first time is
- immediately following RTL generation. The second time is after CSE,
- but only if CSE says repeated jump optimization is needed. The last
- time is right before the final pass. That time, cross-jumping and
- deletion of no-op move instructions are done together with the
- optimizations described above.
-
- The source file of this pass is `jump.c'.
-
- The option `-dj' causes a debugging dump of the RTL code after this
- pass is run for the first time. This dump file's name is made by
- appending `.jump' to the input file name.
-
- * Register scan. This pass finds the first and last use of each
- register, as a guide for common subexpression elimination. Its source
- is in `regclass.c'.
-
- * Common subexpression elimination. This pass also does constant
- propagation. Its source file is `cse.c'. If constant propagation
- causes conditional jumps to become unconditional or to become no-ops,
- jump optimization is run again when CSE is finished.
-
- The option `-ds' causes a debugging dump of the RTL code after this
- pass. This dump file's name is made by appending `.cse' to the input
- file name.
-
- * Loop optimization. This pass moves constant expressions out of loops.
- Its source file is `loop.c'.
-
- The option `-dL' causes a debugging dump of the RTL code after this
- pass. This dump file's name is made by appending `.loop' to the input
- file name.
-
- * Stupid register allocation is performed at this point in a
- nonoptimizing compilation. It does a little data flow analysis as
- well. When stupid register allocation is in use, the next pass
- executed is the reloading pass; the others in between are skipped.
- The source file is `stupid.c'.
-
- * Data flow analysis (`flow.c'). This pass divides the program into
- basic blocks (and in the process deletes unreachable loops); then it
- computes which pseudo-registers are live at each point in the program,
- and makes the first instruction that uses a value point at the
- instruction that computed the value.
-
- This pass also deletes computations whose results are never used, and
- combines memory references with add or subtract instructions to make
- autoincrement or autodecrement addressing.
-
- The option `-df' causes a debugging dump of the RTL code after this
- pass. This dump file's name is made by appending `.flow' to the input
- file name. If stupid register allocation is in use, this dump file
- reflects the full results of such allocation.
-
- * Instruction combination (`combine.c'). This pass attempts to combine
- groups of two or three instructions that are related by data flow into
- single instructions. It combines the RTL expressions for the
- instructions by substitution, simplifies the result using algebra, and
- then attempts to match the result against the machine description.
-
- The option `-dc' causes a debugging dump of the RTL code after this
- pass. This dump file's name is made by appending `.combine' to the
- input file name.
-
- * Register class preferencing. The RTL code is scanned to find out
- which register class is best for each pseudo register. The source
- file is `regclass.c'.
-
- * Local register allocation (`local-alloc.c'). This pass allocates hard
- registers to pseudo registers that are used only within one basic
- block. Because the basic block is linear, it can use fast and
- powerful techniques to do a very good job.
-
- The option `-dl' causes a debugging dump of the RTL code after this
- pass. This dump file's name is made by appending `.lreg' to the input
- file name.
-
- * Global register allocation (`global-alloc.c'). This pass allocates
- hard registers for the remaining pseudo registers (those whose life
- spans are not contained in one basic block).
-
- * Reloading. This pass renumbers pseudo registers with the hardware
- register numbers they were allocated. Pseudo registers that did not
- get hard registers are replaced with stack slots. Then it finds
- instructions that are invalid because a value has failed to end up in
- a register, or has ended up in a register of the wrong kind. It fixes
- up these instructions by reloading the problematical values
- temporarily into registers. Additional instructions are generated to
- do the copying.
-
- Source files are `reload.c' and `reload1.c', plus the header
- `reload.h' used for communication between them.
-
- The option `-dg' causes a debugging dump of the RTL code after this
- pass. This dump file's name is made by appending `.greg' to the input
- file name.
-
- * Jump optimization is repeated, this time including cross-jumping and
- deletion of no-op move instructions. Machine-specific peephole
- optimizations are performed at the same time.
-
- The option `-dJ' causes a debugging dump of the RTL code after this
- pass. This dump file's name is made by appending `.jump2' to the
- input file name.
-
- * Final. This pass outputs the assembler code for the function. It is
- also responsible for identifying spurious test and compare
- instructions. The function entry and exit sequences are generated
- directly as assembler code in this pass; they never exist as RTL.
-
- The source files are `final.c' plus `insn-output.c'; the latter is
- generated automatically from the machine description by the tool
- `genoutput'. The header file `conditions.h' is used for communication
- between these files.
-
- * Debugging information output. This is run after final because it must
- output the stack slot offsets for pseudo registers that did not get
- hard registers. Source files are `dbxout.c' for DBX symbol table
- format and `symout.c' for GDB's own symbol table format.
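-
- The `-d' letters and dump file suffixes named in the pass list above can be
- gathered into one table. The following is a minimal sketch only: the names
- `dump_table', `dump_suffix' and `dump_file_name' are hypothetical, chosen
- for illustration, and do not claim to reflect how `toplev.c' is written.
-
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pairs each `-d' letter with the suffix appended to the input file
   name, as listed in the pass descriptions.  Illustrative names only.  */
struct dump_entry
{
  char letter;
  const char *suffix;
};

static const struct dump_entry dump_table[] = {
  { 'r', ".rtl" },      /* after RTL generation */
  { 'j', ".jump" },     /* after the first jump optimization */
  { 's', ".cse" },      /* after common subexpression elimination */
  { 'L', ".loop" },     /* after loop optimization */
  { 'f', ".flow" },     /* after data flow analysis */
  { 'c', ".combine" },  /* after instruction combination */
  { 'l', ".lreg" },     /* after local register allocation */
  { 'g', ".greg" },     /* after reloading */
  { 'J', ".jump2" },    /* after the last jump optimization */
};

/* Return the dump suffix for LETTER, or NULL if LETTER is not listed.  */
static const char *
dump_suffix (char letter)
{
  size_t i;

  for (i = 0; i < sizeof dump_table / sizeof dump_table[0]; i++)
    if (dump_table[i].letter == letter)
      return dump_table[i].suffix;
  return NULL;
}

/* Store INPUT followed by the suffix for LETTER in BUF (SIZE bytes).
   Return 1 on success, 0 if LETTER is unknown or BUF is too small.  */
int
dump_file_name (char *buf, size_t size, const char *input, char letter)
{
  const char *suffix = dump_suffix (letter);

  assert (buf != NULL && input != NULL);
  if (suffix == NULL || strlen (input) + strlen (suffix) + 1 > size)
    return 0;
  strcpy (buf, input);
  strcat (buf, suffix);
  return 1;
}
```
-
- With this sketch, `dump_file_name (buf, sizeof buf, "foo.c", 'r')' stores
- `foo.c.rtl' in `buf'.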
-
- Some additional files are used by all or many passes:
-
- * Every pass uses `machmode.def', which defines the machine modes.
-
- * All the passes that work with RTL use the header files `rtl.h' and
- `rtl.def', and subroutines in file `rtl.c'. The tools `gen*' also use
- these files to read and work with the machine description RTL.
-
- * Several passes refer to the header file `insn-config.h' which contains
- a few parameters (C macro definitions) generated automatically from
- the machine description RTL by the tool `genconfig'.
-
- * Several passes use the instruction recognizer, which consists of
- `recog.c' and `recog.h', plus the files `insn-recog.c' and
- `insn-extract.c' that are generated automatically from the machine
- description by the tools `genrecog' and `genextract'.
-
- * Several passes use the header files `regs.h', which defines the
- information recorded about pseudo register usage, and `basic-block.h',
- which defines the information recorded about basic blocks.
-
- * `hard-reg-set.h' defines the type `HARD_REG_SET', a bit-vector with a
- bit for each hard register, and some macros to manipulate it. This
- type is just `int' if the machine has few enough hard registers;
- otherwise it is an array of `int' and some of the macros expand into
- loops.
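-
- The array form of a hard register set can be pictured with the sketch
- below. It is an illustration under stated assumptions, not the contents of
- `hard-reg-set.h': the register count of 40 is invented, all `toy_' names
- are hypothetical, and `unsigned int' is used in place of `int' so that the
- shifts are well defined in portable C.
-
```c
#include <assert.h>
#include <limits.h>

/* Hypothetical register count; too many for a single word here.  */
#define TOY_NUM_HARD_REGS 40

#define TOY_INT_BITS ((int) (sizeof (unsigned int) * CHAR_BIT))
#define TOY_SET_WORDS \
  ((TOY_NUM_HARD_REGS + TOY_INT_BITS - 1) / TOY_INT_BITS)

/* One bit per hard register, spread over an array of words.  */
typedef unsigned int toy_hard_reg_set[TOY_SET_WORDS];

/* Clear every bit; with the array form this has to loop.  */
void
toy_clear_hard_reg_set (unsigned int *set)
{
  int i;

  for (i = 0; i < TOY_SET_WORDS; i++)
    set[i] = 0;
}

/* Set the bit for hard register REGNO.  */
void
toy_set_hard_reg_bit (unsigned int *set, int regno)
{
  assert (regno >= 0 && regno < TOY_NUM_HARD_REGS);
  set[regno / TOY_INT_BITS] |= 1u << (regno % TOY_INT_BITS);
}

/* Return 1 if the bit for hard register REGNO is set, else 0.  */
int
toy_test_hard_reg_bit (const unsigned int *set, int regno)
{
  assert (regno >= 0 && regno < TOY_NUM_HARD_REGS);
  return (int) ((set[regno / TOY_INT_BITS] >> (regno % TOY_INT_BITS)) & 1u);
}

/* Return 1 if A and B have any register in common, else 0.  */
int
toy_hard_reg_sets_intersect (const unsigned int *a, const unsigned int *b)
{
  int i;

  for (i = 0; i < TOY_SET_WORDS; i++)
    if (a[i] & b[i])
      return 1;
  return 0;
}
```
-
- When the machine has few enough registers, the same operations reduce to
- single bitwise expressions on one word, which is why the scalar form is
- preferred where it fits.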
-
-
- File: internals, Node: RTL, Next: Machine Desc, Prev: Passes, Up: Top
-
- RTL Representation
- ******************
-
- Most of the work of the compiler is done on an intermediate representation
- called register transfer language. In this language, the instructions to
- be output are described, pretty much one by one, in an algebraic form that
- describes what the instruction does.
-
- RTL is inspired by Lisp lists. It has both an internal form, made up of
- structures that point at other structures, and a textual form that is used
- in the machine description and in printed debugging dumps. The textual
- form uses nested parentheses to indicate the pointers in the internal form.
-
- * Menu:
-
- * RTL Objects:: Expressions vs vectors vs strings vs integers.
- * Accessors:: Macros to access expression operands or vector elts.
- * Flags:: Other flags in an RTL expression.
- * Machine Modes:: Describing the size and format of a datum.
- * Constants:: Expressions with constant values.
- * Regs and Memory:: Expressions representing register contents or memory.
- * Arithmetic:: Expressions representing arithmetic on other expressions.
- * Comparisons:: Expressions representing comparison of expressions.
- * Bit Fields:: Expressions representing bit-fields in memory or reg.
- * Conversions:: Extending, truncating, floating or fixing.
- * RTL Declarations:: Declaring volatility, constancy, etc.
- * Side Effects:: Expressions for storing in registers, etc.
- * Incdec:: Embedded side-effects for autoincrement addressing.
- * Assembler:: Representing `asm' with operands.
- * Insns:: Expression types for entire insns.
- * Calls:: RTL representation of function call insns.
- * Sharing:: Some expressions are unique; others *must* be copied.
-
-
-
- File: internals, Node: RTL Objects, Next: Accessors, Prev: RTL, Up: RTL
-
- RTL Object Types
- ================
-
- RTL uses four kinds of objects: expressions, integers, strings and vectors.
- Expressions are the most important ones. An RTL expression (``RTX'', for
- short) is a C structure, but it is usually referred to with a pointer, a
- type that is given the typedef name `rtx'.
-
- An integer is simply an `int'. Its written form uses decimal digits.
-
- A string is a sequence of characters. In core it is represented as a `char
- *' in usual C fashion, and it is written in C syntax as well. However,
- strings in RTL may never be empty. If you write an empty string in a
- machine description, it is represented in core as a null pointer rather
- than as a pointer to a null character. In certain contexts, these null
- pointers are valid in place of strings. Within RTL code, strings appear
- only inside `symbol_ref' expressions, but they appear in other contexts in
- the RTL expressions that make up machine descriptions.
-
- A vector contains an arbitrary, specified number of pointers to
- expressions. The number of elements in the vector is explicitly present in
- the vector. The written form of a vector consists of square brackets
- (`[...]') surrounding the elements, in sequence and with whitespace
- separating them. Vectors of length zero are not created; null pointers are
- used instead.
-
- Expressions are classified by "expression codes" (also called RTX codes).
- The expression code is a name defined in `rtl.def', which is also (in upper
- case) a C enumeration constant. The possible expression codes and their
- meanings are machine-independent. The code of an RTX can be extracted with
- the macro `GET_CODE (X)' and altered with `PUT_CODE (X, NEWCODE)'.
-
- The expression code determines how many operands the expression contains,
- and what kinds of objects they are. In RTL, unlike Lisp, you cannot tell
- by looking at an operand what kind of object it is. Instead, you must know
- from its context---from the expression code of the containing expression.
- For example, in an expression of code `subreg', the first operand is to be
- regarded as an expression and the second operand as an integer. In an
- expression of code `plus', there are two operands, both of which are to be
- regarded as expressions. In a `symbol_ref' expression, there is one
- operand, which is to be regarded as a string.
-
- Expressions are written as parentheses containing the name of the
- expression type, its flags and machine mode if any, and then the operands
- of the expression (separated by spaces).
-
- Expression code names in the `md' file are written in lower case, but when
- they appear in C code they are written in upper case. In this manual, they
- are shown as follows: `const_int'.
-
- In a few contexts a null pointer is valid where an expression is normally
- wanted. The written form of this is `(nil)'.
-
-
- File: internals, Node: Accessors, Next: Flags, Prev: RTL Objects, Up: RTL
-
- Access to Operands
- ==================
-
- For each expression type `rtl.def' specifies the number of contained
- objects and their kinds, with four possibilities: `e' for expression
- (actually a pointer to an expression), `i' for integer, `s' for string, and
- `E' for vector of expressions. The sequence of letters for an expression
- code is called its "format". Thus, the format of `subreg' is `ei'.
-
- Two other format characters are used occasionally: `u' and `0'. `u' is
- equivalent to `e' except that it is printed differently in debugging dumps,
- and `0' means a slot whose contents do not fit any normal category. `0'
- slots are not printed at all in dumps, and are often used in special ways
- by small parts of the compiler.
-
- There are macros to get the number of operands and the format of an
- expression code:
-
- `GET_RTX_LENGTH (CODE)'
- Number of operands of an RTX of code CODE.
-
- `GET_RTX_FORMAT (CODE)'
- The format of an RTX of code CODE, as a C string.
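-
- As an illustration of how a format string drives generic code, here is a
- small sketch. It assumes a toy table with four codes whose formats are the
- ones stated in this chapter; the `toy_' and `TOY_' names are hypothetical
- stand-ins, and deriving the length with `strlen' is a simplification made
- for the example.
-
```c
#include <assert.h>
#include <string.h>

/* A handful of expression codes; the real list comes from `rtl.def'.  */
enum toy_code
{
  TOY_CONST_INT, TOY_SYMBOL_REF, TOY_PLUS, TOY_SUBREG, TOY_NUM_CODES
};

/* One format string per code, indexed by the code.  */
static const char *const toy_format[TOY_NUM_CODES] = {
  "i",   /* const_int: one integer */
  "s",   /* symbol_ref: one string */
  "ee",  /* plus: two expressions */
  "ei",  /* subreg: an expression, then an integer */
};

#define TOY_GET_RTX_FORMAT(CODE) (toy_format[(int) (CODE)])
#define TOY_GET_RTX_LENGTH(CODE) ((int) strlen (TOY_GET_RTX_FORMAT (CODE)))

/* Count the operands of CODE that are expression pointers (`e').  */
int
toy_count_subexpressions (enum toy_code code)
{
  const char *fmt;
  int n = 0;

  assert ((int) code >= 0 && (int) code < TOY_NUM_CODES);
  for (fmt = TOY_GET_RTX_FORMAT (code); *fmt != '\0'; fmt++)
    if (*fmt == 'e')
      n++;
  return n;
}
```
-
- A routine that walks an arbitrary expression can use the same loop to
- decide, operand by operand, whether to recurse.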
-
- Operands of expressions are accessed using the macros `XEXP', `XINT' and
- `XSTR'. Each of these macros takes two arguments: an expression-pointer
- (RTX) and an operand number (counting from zero). Thus,
-
- XEXP (X, 2)
-
- accesses operand 2 of expression X, as an expression.
-
- XINT (X, 2)
-
- accesses the same operand as an integer. `XSTR', used in the same fashion,
- would access it as a string.
-
- Any operand can be accessed as an integer, as an expression or as a string.
- You must choose the correct method of access for the kind of value
- actually stored in the operand. You would do this based on the expression
- code of the containing expression. That is also how you would know how
- many operands there are.
-
- For example, if X is a `subreg' expression, you know that it has two
- operands which can be correctly accessed as `XEXP (X, 0)' and `XINT (X,
- 1)'. If you did `XINT (X, 0)', you would get the address of the expression
- operand but cast as an integer; that might occasionally be useful, but it
- would be cleaner to write `(int) XEXP (X, 0)'. `XEXP (X, 1)' would also
- compile without error, and would return the second, integer operand cast as
- an expression pointer, which would probably result in a crash when
- accessed. Nothing stops you from writing `XEXP (X, 28)' either, but this
- will access memory past the end of the expression with unpredictable results.
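-
- The reason any operand can be read in any of these ways is easiest to see
- with a union of operand slots. The sketch below is a simplified stand-in,
- not the declarations in `rtl.h': every `toy_' and `TOY_' name and the
- fixed two-slot layout are assumptions made for illustration.
-
```c
#include <assert.h>
#include <stdlib.h>

enum toy_code { TOY_REG, TOY_SUBREG };

typedef struct toy_rtx_def *toy_rtx;

/* One operand slot.  Nothing records which member is live; the
   expression code of the containing expression decides that.  */
union toy_operand
{
  int rtint;
  const char *rtstr;
  toy_rtx rtx;
};

struct toy_rtx_def
{
  enum toy_code code;
  union toy_operand fld[2];
};

#define TOY_GET_CODE(X) ((X)->code)
#define TOY_XEXP(X, N) ((X)->fld[N].rtx)
#define TOY_XINT(X, N) ((X)->fld[N].rtint)
#define TOY_XSTR(X, N) ((X)->fld[N].rtstr)

static toy_rtx
toy_alloc (enum toy_code code)
{
  toy_rtx x = malloc (sizeof *x);

  if (x == NULL)
    abort ();
  x->code = code;
  return x;
}

/* Build (reg REGNO); operand 0 is an integer.  */
toy_rtx
toy_gen_reg (int regno)
{
  toy_rtx x = toy_alloc (TOY_REG);

  TOY_XINT (x, 0) = regno;      /* the macros are lvalues */
  return x;
}

/* Build (subreg REG WORDNUM); the format is `ei'.  */
toy_rtx
toy_gen_subreg (toy_rtx reg, int wordnum)
{
  toy_rtx x = toy_alloc (TOY_SUBREG);

  TOY_XEXP (x, 0) = reg;
  TOY_XINT (x, 1) = wordnum;
  return x;
}

/* Return the register number inside a `subreg' of a `reg'.  The
   code checks stand in for the knowledge the caller must have.  */
int
toy_subreg_regno (toy_rtx x)
{
  assert (TOY_GET_CODE (x) == TOY_SUBREG);
  assert (TOY_GET_CODE (TOY_XEXP (x, 0)) == TOY_REG);
  return TOY_XINT (TOY_XEXP (x, 0), 0);
}
```
-
- In this sketch, writing `TOY_XEXP (x, 1)' on a `subreg' would compile but
- would reinterpret the integer slot as a pointer, which is the mistake the
- paragraph above warns about.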
-
- Access to operands which are vectors is more complicated. You can use the
- macro `XVEC' to get the vector-pointer itself, or the macros `XVECEXP' and
- `XVECLEN' to access the elements and length of a vector.
-
- `XVEC (EXP, IDX)'
- Access the vector-pointer which is operand number IDX in EXP.
-
- `XVECLEN (EXP, IDX)'
- Access the length (number of elements) in the vector which is in
- operand number IDX in EXP. This value is an `int'.
-
- `XVECEXP (EXP, IDX, ELTNUM)'
- Access element number ELTNUM in the vector which is in operand number
- IDX in EXP. This value is an RTX.
-
- It is up to you to make sure that ELTNUM is not negative and is less
- than `XVECLEN (EXP, IDX)'.
-
- All the macros defined in this section expand into lvalues and therefore
- can be used to assign the operands, lengths and vector elements as well as
- to access them.
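-
- A vector operand can be sketched as a length followed by the element
- pointers. The code below is an illustration only: the `toy_' and `TOY_'
- names, the one-slot expression layout, and the checked accessor
- `toy_vec_elt' are hypothetical and are not taken from `rtl.h'.
-
```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_rtx_def *toy_rtx;

struct toy_rtvec_def
{
  int num_elem;         /* the length is stored in the vector itself */
  toy_rtx elem[];       /* C99 flexible array member */
};
typedef struct toy_rtvec_def *toy_rtvec;

union toy_operand
{
  int rtint;
  toy_rtx rtx;
  toy_rtvec rtvec;
};

struct toy_rtx_def
{
  int code;
  union toy_operand fld[1];
};

#define TOY_XVEC(EXP, IDX) ((EXP)->fld[IDX].rtvec)
#define TOY_XVECLEN(EXP, IDX) (TOY_XVEC (EXP, IDX)->num_elem)
#define TOY_XVECEXP(EXP, IDX, ELTNUM) (TOY_XVEC (EXP, IDX)->elem[ELTNUM])

/* Allocate a vector of N null elements.  N must be positive, since
   zero-length vectors are represented by null pointers instead.  */
toy_rtvec
toy_rtvec_alloc (int n)
{
  toy_rtvec v;
  int i;

  assert (n > 0);
  v = malloc (sizeof *v + (size_t) n * sizeof (toy_rtx));
  if (v == NULL)
    abort ();
  v->num_elem = n;
  for (i = 0; i < n; i++)
    v->elem[i] = NULL;
  return v;
}

/* Read element ELTNUM with the range check that the plain macro
   leaves to the caller.  */
toy_rtx
toy_vec_elt (toy_rtx exp, int idx, int eltnum)
{
  assert (TOY_XVEC (exp, idx) != NULL);
  assert (eltnum >= 0 && eltnum < TOY_XVECLEN (exp, idx));
  return TOY_XVECEXP (exp, idx, eltnum);
}
```
-
- Because the macros expand into lvalues, an element is stored with a plain
- assignment such as `TOY_XVECEXP (x, 0, 1) = y'.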
-
-
- File: internals, Node: Flags, Next: Machine Modes, Prev: Accessors, Up: RTL
-
- Flags in an RTL Expression
- ==========================
-
- RTL expressions contain several flags (one-bit bit-fields) that are used in
- certain types of expression.
-
- `used'
- This flag is used only momentarily, at the end of RTL generation for a
- function, to count the number of times an expression appears in insns.
- Expressions that appear more than once are copied, according to the
- rules for shared structure (*Note Sharing::.).
-
- `volatil'
- This flag is used in `mem' and `reg' expressions and in insns. In RTL
- dump files, it is printed as `/v'.
-
- In a `mem' expression, it is 1 if the memory reference is volatile.
- Volatile memory references may not be deleted, reordered or combined.
-
- In a `reg' expression, it is 1 if the value is a user-level variable.
- 0 indicates an internal compiler temporary.
-
- In an insn, 1 means the insn has been deleted.
-
- `in_struct'
- This flag is used in `mem' expressions. It is 1 if the memory datum
- referred to is all or part of a structure or array; 0 if it is (or
- might be) a scalar variable. A reference through a C pointer has 0
- because the pointer might point to a scalar variable.
-
- This information allows the compiler to determine something about
- possible cases of aliasing.
-
- In an RTL dump, this flag is represented as `/s'.
-
- `unchanging'
- This flag is used in `reg' and `mem' expressions. 1 means that the
- value of the expression never changes (at least within the current
- function).
-
- In an RTL dump, this flag is represented as `/u'.
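-
- The flags and their dump notation can be sketched as one-bit fields plus a
- small formatting routine. This is a hedged illustration: `struct
- toy_flags' and `toy_flag_suffix' are hypothetical names, and the order in
- which the markers are emitted here is an arbitrary choice of the sketch.
-
```c
#include <assert.h>
#include <string.h>

/* The four flags described above, as one-bit bit-fields.  */
struct toy_flags
{
  unsigned int used : 1;
  unsigned int volatil : 1;
  unsigned int in_struct : 1;
  unsigned int unchanging : 1;
};

/* Write the dump markers for the flags set in F into BUF, which must
   have room for at least 7 bytes.  `used' has no marker listed in
   this section, so it is not printed.  */
void
toy_flag_suffix (struct toy_flags f, char *buf)
{
  assert (buf != NULL);
  buf[0] = '\0';
  if (f.volatil)
    strcat (buf, "/v");
  if (f.in_struct)
    strcat (buf, "/s");
  if (f.unchanging)
    strcat (buf, "/u");
}
```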
-
-
- File: internals, Node: Machine Modes, Next: Constants, Prev: Flags, Up: RTL
-
- Machine Modes
- =============
-
- A machine mode describes a size of data object and the representation used
- for it. In the C code, machine modes are represented by an enumeration
- type, `enum machine_mode', defined in `machmode.def'. Each RTL expression
- has room for a machine mode and so do certain kinds of tree expressions
- (declarations and types, to be precise).
-
- In debugging dumps and machine descriptions, the machine mode of an RTL
- expression is written after the expression code with a colon to separate
- them. The letters `mode' which appear at the end of each machine mode name
- are omitted. For example, `(reg:SI 38)' is a `reg' expression with machine
- mode `SImode'. If the mode is `VOIDmode', it is not written at all.
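-
- The written form just described can be reproduced with a few lines of C.
- The sketch below is not the dump code of the compiler; `toy_format_rtx' is
- a hypothetical helper that handles only an expression with a single
- integer operand, taking the code and mode names as strings.
-
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an expression with one integer operand into BUF, omitting
   the trailing `mode' of MODE_NAME and omitting `VOIDmode' entirely.
   Returns what snprintf returns.  */
int
toy_format_rtx (char *buf, size_t size, const char *code_name,
                const char *mode_name, int value)
{
  size_t len = strlen (mode_name);

  /* Every machine mode name ends in `mode'.  */
  assert (len > 4 && strcmp (mode_name + len - 4, "mode") == 0);

  if (strcmp (mode_name, "VOIDmode") == 0)
    return snprintf (buf, size, "(%s %d)", code_name, value);
  return snprintf (buf, size, "(%s:%.*s %d)", code_name,
                   (int) (len - 4), mode_name, value);
}
```
-
- Called with `"reg"', `"SImode"' and 38, this produces `(reg:SI 38)'; called
- with `"const_int"', `"VOIDmode"' and 5, it produces `(const_int 5)'.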
-
- Here is a table of machine modes.
-
- `QImode'
- ``Quarter-Integer'' mode represents a single byte treated as an integer.
-
- `HImode'
- ``Half-Integer'' mode represents a two-byte integer.
-
- `SImode'
- ``Single Integer'' mode represents a four-byte integer.
-
- `DImode'
- ``Double Integer'' mode represents an eight-byte integer.
-
- `TImode'
- ``Tetra Integer'' (?) mode represents a sixteen-byte integer.
-
- `SFmode'
- ``Single Floating'' mode represents a single-precision (four byte)
- floating point number.
-
- `DFmode'
- ``Double Floating'' mode represents a double-precision (eight byte)
- floating point number.
-
- `TFmode'
- ``Tetra Floating'' mode represents a quadruple-precision (sixteen
- byte) floating point number.
-
- `BLKmode'
- ``Block'' mode represents values that are aggregates to which none of
- the other modes apply. In RTL, only memory references can have this
- mode, and only if they appear in string-move or vector instructions.
- On machines which have no such instructions, `BLKmode' will not appear
- in RTL.
-
- `VOIDmode'
- Void mode means the absence of a mode or an unspecified mode. For
- example, RTL expressions of code `const_int' have mode `VOIDmode'
- because they can be taken to have whatever mode the context requires.
- In debugging dumps of RTL, `VOIDmode' is expressed by the absence of
- any mode.
-
- `EPmode'
- ``Entry Pointer'' mode is intended to be used for function variables
- in Pascal and other block structured languages. Such values contain
- both a function address and a static chain pointer for access to
- automatic variables of outer levels. This mode is only partially
- implemented since C does not use it.
-
- `CSImode, ...'
- ``Complex Single Integer'' mode stands for a complex number
- represented as a pair of `SImode' integers. Any of the integer and
- floating modes may have `C' prefixed to its name to obtain a complex
- number mode. For example, there are `CQImode', `CSFmode', and
- `CDFmode'. Since C does not support complex numbers, these machine
- modes are only partially implemented.
-
- `BImode'
- This is the machine mode of a bit-field in a structure. It is used
- only in the syntax tree, never in RTL, and in the syntax tree it
- appears only in declaration nodes. In C, it appears only in
- `FIELD_DECL' nodes for structure fields defined with a bit size.
-
- The machine description defines `Pmode' as a C macro which expands into the
- machine mode used for addresses. Normally this is `SImode'.
-
- The only modes which a machine description must support are `QImode',
- `SImode', `SFmode' and `DFmode'. The compiler will attempt to use `DImode'
- for two-word structures and unions, but it would not be hard to program it
- to avoid this. Likewise, you can arrange for the C type `short int' to
- avoid using `HImode'. In the long term it would be desirable to make the
- set of available machine modes machine-dependent and eliminate all
- assumptions about specific machine modes or their uses from the
- machine-independent code of the compiler.
-
- Here are some C macros that relate to machine modes:
-
- `GET_MODE (X)'
- Returns the machine mode of the RTX X.
-
- `PUT_MODE (X, NEWMODE)'
- Alters the machine mode of the RTX X to be NEWMODE.
-
- `GET_MODE_SIZE (M)'
- Returns the size in bytes of a datum of mode M.
-
- `GET_MODE_BITSIZE (M)'
- Returns the size in bits of a datum of mode M.
-
- `GET_MODE_UNIT_SIZE (M)'
- Returns the size in bytes of the subunits of a datum of mode M. This
- is the same as `GET_MODE_SIZE' except in the case of complex modes and
- `EPmode'. For them, the unit size is the size of the real or
- imaginary part, or the size of the function pointer or the context
- pointer.
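-
- The relationship between these size macros can be shown with a toy table.
- This is a sketch under assumptions: the `toy_' and `TOY_' names are
- hypothetical, only seven modes are listed, all sizes are in bytes as given
- in the table of modes above, and a byte is assumed to hold 8 bits.
-
```c
#include <assert.h>

#define TOY_BITS_PER_UNIT 8     /* assumed bits per byte */

enum toy_mode
{
  TOY_QImode, TOY_HImode, TOY_SImode, TOY_DImode,
  TOY_SFmode, TOY_DFmode, TOY_CSImode, TOY_NUM_MODES
};

/* Size in bytes of a whole datum of each mode.  */
static const int toy_mode_size[TOY_NUM_MODES] = { 1, 2, 4, 8, 4, 8, 8 };

/* Size in bytes of one subunit; differs only for the complex mode,
   which is a pair of `SImode' integers.  */
static const int toy_mode_unit_size[TOY_NUM_MODES] = { 1, 2, 4, 8, 4, 8, 4 };

#define TOY_GET_MODE_SIZE(M) (toy_mode_size[(int) (M)])
#define TOY_GET_MODE_BITSIZE(M) (TOY_BITS_PER_UNIT * TOY_GET_MODE_SIZE (M))
#define TOY_GET_MODE_UNIT_SIZE(M) (toy_mode_unit_size[(int) (M)])

/* Number of subunits in a datum of mode M: 2 for a complex mode,
   1 for the scalar modes listed here.  */
int
toy_units_in_mode (enum toy_mode m)
{
  assert ((int) m >= 0 && (int) m < TOY_NUM_MODES);
  return TOY_GET_MODE_SIZE (m) / TOY_GET_MODE_UNIT_SIZE (m);
}
```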
-
-
- File: internals, Node: Constants, Next: Regs and Memory, Prev: Machine Modes, Up: RTL
-
- Constant Expression Types
- =========================
-
- The simplest RTL expressions are those that represent constant values.
-
- `(const_int I)'
- This type of expression represents the integer value I. I is
- customarily accessed with the macro `INTVAL' as in `INTVAL (EXP)',
- which is equivalent to `XINT (EXP, 0)'.
-
- There is only one expression object for the integer value zero; it is
- the value of the variable `const0_rtx'. Likewise, the only expression
- for integer value one is found in `const1_rtx'. Any attempt to create
- an expression of code `const_int' and value zero or one will return
- `const0_rtx' or `const1_rtx' as appropriate.
-
- `(const_double:M I0 I1)'
- Represents a floating point constant value of mode M. The two
- integers I0 and I1 together contain the bits of a `double' value. To
- convert them to a `double', do
-
- union { double d; int i[2];} u;
- u.i[0] = XINT (x, 0);
- u.i[1] = XINT (x, 1);
-
- and then refer to `u.d'. The value of the constant is represented as
- a double in this fashion even if the value represented is
- single-precision.
-
- The global variables `dconst0_rtx' and `fconst0_rtx' hold
- `const_double' expressions with value 0, in modes `DFmode' and
- `SFmode', respectively.
-
- `(symbol_ref SYMBOL)'
- Represents the value of an assembler label for data. SYMBOL is a
- string that describes the name of the assembler label. If it starts
- with a `*', the label is the rest of SYMBOL not including the `*'.
- Otherwise, the label is SYMBOL, prefixed with `_'.
-
- `(label_ref LABEL)'
- Represents the value of an assembler label for code. It contains one
- operand, an expression, which must be a `code_label' that appears in
- the instruction sequence to identify the place where the label should
- go.
-
- The reason for using a distinct expression type for code label
- references is so that jump optimization can distinguish them.
-
- `(const EXP)'
- Represents a constant that is the result of an assembly-time
- arithmetic computation. The operand, EXP, is an expression that
- contains only constants (`const_int', `symbol_ref' and `label_ref'
- expressions) combined with `plus' and `minus'. However, not all
- combinations are valid, since the assembler cannot do arbitrary
- arithmetic on relocatable symbols.
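-
- The union shown in the `const_double' entry above can be exercised in both
- directions. The sketch below is a self-contained illustration with
- hypothetical names (`struct toy_const_double', `toy_split_double',
- `toy_join_double'); it assumes, as the union does, a host on which a
- `double' occupies exactly two `int's.
-
```c
#include <assert.h>
#include <string.h>

/* Stand-in for the two integer operands of a `const_double'.  */
struct toy_const_double
{
  int i0;
  int i1;
};

/* Split D into two ints holding its bits.  */
struct toy_const_double
toy_split_double (double d)
{
  struct toy_const_double c;
  int words[2];

  assert (sizeof (double) == 2 * sizeof (int));
  memcpy (words, &d, sizeof d);
  c.i0 = words[0];
  c.i1 = words[1];
  return c;
}

/* Rebuild the double, following the union recipe in the text.  */
double
toy_join_double (struct toy_const_double c)
{
  union { double d; int i[2]; } u;

  u.i[0] = c.i0;
  u.i[1] = c.i1;
  return u.d;
}
```
-
- Splitting a value and joining it again returns the original bits, so the
- round trip is exact for any value that is not a NaN.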
-
-
- File: internals, Node: Regs and Memory, Next: Arithmetic, Prev: Constants, Up: RTL
-
- Registers and Memory
- ====================
-
- Here are the RTL expression types for describing access to machine
- registers and to main memory.
-
- `(reg:M N)'
- For small values of the integer N (less than `FIRST_PSEUDO_REGISTER'),
- this stands for a reference to machine register number N: a "hard
- register". For larger values of N, it stands for a temporary value or
- "pseudo register". The compiler's strategy is to generate code
- assuming an unlimited number of such pseudo registers, and later
- convert them into hard registers or into memory references.
-
- The symbol `FIRST_PSEUDO_REGISTER' is defined by the machine
- description, since the number of hard registers on the machine is an
- invariant characteristic of the machine. Note, however, that not all
- of the machine registers must be general registers. All the machine
- registers that can be used for storage of data are given hard register
- numbers, even those that can be used only in certain instructions or
- can hold only certain types of data.
-
- Each pseudo register number used in a function's RTL code is
- represented by a unique `reg' expression.
-
- M is the machine mode of the reference. It is necessary because
- machines can generally refer to each register in more than one mode.
- For example, a register may contain a full word but there may be
- instructions to refer to it as a half word or as a single byte, as
- well as instructions to refer to it as a floating point number of
- various precisions.
-
- Even for a register that the machine can access in only one mode, the
- mode must always be specified.
-
- A hard register may be accessed in various modes throughout one
- function, but each pseudo register is given a natural mode and is
- accessed only in that mode. When it is necessary to describe an
- access to a pseudo register using a nonnatural mode, a `subreg'
- expression is used.
-
- A `reg' expression with a machine mode that specifies more than one
- word of data may actually stand for several consecutive registers. If
- in addition the register number specifies a hardware register, then it
- actually represents several consecutive hardware registers starting
- with the specified one.
-
- Such multi-word hardware register `reg' expressions may not be live
- across the boundary of a basic block. The lifetime analysis pass does
- not know how to record properly that several consecutive registers are
- actually live there, and therefore register allocation would be
- confused. The CSE pass must go out of its way to make sure the
- situation does not arise.
-
- `(subreg:M REG WORDNUM)'
- `subreg' expressions are used to refer to a register in a machine mode
- other than its natural one, or to refer to one register of a
- multi-word `reg' that actually refers to several registers.
-
- Each pseudo-register has a natural mode. If it is necessary to
- operate on it in a different mode---for example, to perform a fullword
- move instruction on a pseudo-register that contains a single byte---
- the pseudo-register must be enclosed in a `subreg'. In such a case,
- WORDNUM is zero.
-
- The other use of `subreg' is to extract the individual registers of a
- multi-register value. Machine modes such as `DImode' and `EPmode'
- indicate values longer than a word, values which usually require two
- consecutive registers. To access one of the registers, use a `subreg'
- with mode `SImode' and a WORDNUM that says which register.
-
- The compilation parameter `WORDS_BIG_ENDIAN', if defined, says that
- word number zero is the most significant part; otherwise, it is the
- least significant part.
-
- Note that it is not valid to access a `DFmode' value in `SFmode' using
- a `subreg'. On some machines the most significant part of a `DFmode'
- value does not have the same format as a single-precision floating
- value.
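-
- As an illustration only, the word selection can be modeled in C. This
- sketch is not GCC source: the function name `subreg_word' and its
- `words_big_endian' argument are hypothetical, and it assumes a 32-bit
- word and a two-word (`DImode') value.
-
```c
#include <stdint.h>

/* Model of (subreg:SI (reg:DI R) WORDNUM) on a machine with 32-bit words.
   words_big_endian stands in for whether WORDS_BIG_ENDIAN is defined.
   wordnum must be 0 or 1. */
uint32_t subreg_word(uint64_t value, int wordnum, int words_big_endian)
{
    /* Count words from the least significant end of the value. */
    int from_low = words_big_endian ? (1 - wordnum) : wordnum;
    return (uint32_t)(value >> (32 * from_low));
}
```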
-
- `(cc0)'
- This refers to the machine's condition code register. It has no
- operands and may not have a machine mode. It may be validly used in
- only two contexts: as the destination of an assignment (in test and
- compare instructions) and in comparison operators comparing against
- zero (`const_int' with value zero; that is to say, `const0_rtx').
-
- There is only one expression object of code `cc0'; it is the value of
- the variable `cc0_rtx'. Any attempt to create an expression of code
- `cc0' will return `cc0_rtx'.
-
- One special thing about the condition code register is that
- instructions can set it implicitly. On many machines, nearly all
- instructions set the condition code based on the value that they
- compute or store. It is not necessary to record these actions
- explicitly in the RTL because the machine description includes a
- prescription for recognizing the instructions that do so (by means of
- the macro `NOTICE_UPDATE_CC'). Only instructions whose sole purpose
- is to set the condition code, and instructions that use the condition
- code, need mention `(cc0)'.
-
- `(pc)'
- This represents the machine's program counter. It has no operands and
- may not have a machine mode. `(pc)' may be validly used only in
- certain specific contexts in jump instructions.
-
- There is only one expression object of code `pc'; it is the value of
- the variable `pc_rtx'. Any attempt to create an expression of code
- `pc' will return `pc_rtx'.
-
- All instructions that do not jump alter the program counter implicitly
- by incrementing it, but there is no need to mention this in the RTL.
-
- `(mem:M ADDR)'
- This RTX represents a reference to main memory at an address
- represented by the expression ADDR. M specifies how large a unit of
- memory is accessed.
-
-
- File: internals, Node: Arithmetic, Next: Comparisons, Prev: Regs and Memory, Up: RTL
-
- RTL Expressions for Arithmetic
- ==============================
-
- `(plus:M X Y)'
- Represents the sum of the values represented by X and Y carried out in
- machine mode M. This is valid only if X and Y both are valid for mode
- M.
-
- `(minus:M X Y)'
- Like `plus' but represents subtraction.
-
- `(minus X Y)'
- Represents the result of subtracting Y from X for purposes of
- comparison. The absence of a machine mode in the `minus' expression
- indicates that the result is computed without overflow, as if with
- infinite precision.
-
- Of course, machines can't really subtract with infinite precision.
- However, they can pretend to do so when only the sign of the result
- will be used, which is the case when the result is stored in `(cc0)'.
- And that is the only way this kind of expression may validly be used:
- as a value to be stored in the condition codes.
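-
- The difference between the modeless `minus' and a wrapped subtraction
- can be sketched in C. This is only a model, assuming 32-bit operands;
- the function names are hypothetical, and a wider type stands in for
- the ``infinite precision'' that real hardware obtains by other means.
-
```c
#include <stdint.h>

/* Sign of X - Y computed without overflow: -1, 0 or 1. */
int compare_sign_exact(int32_t x, int32_t y)
{
    int64_t d = (int64_t)x - (int64_t)y;   /* cannot overflow in 64 bits */
    return (d > 0) - (d < 0);
}

/* Sign bit of the 32-bit wrapped difference, as (minus:SI X Y) would give. */
int wrapped_difference_is_negative(int32_t x, int32_t y)
{
    uint32_t d = (uint32_t)x - (uint32_t)y;  /* wraps modulo 2^32 */
    return (d >> 31) != 0;
}
```
-
- For X equal to the most negative 32-bit value and Y equal to 1, the
- exact difference is negative but the wrapped difference has a clear
- sign bit, so the two disagree.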
-
- `(neg:M X)'
- Represents the negation (subtraction from zero) of the value
- represented by X, carried out in mode M. X must be valid for mode M.
-
- `(mult:M X Y)'
- Represents the signed product of the values represented by X and Y
- carried out in machine mode M. If X and Y are both valid for mode M,
- this is ordinary size-preserving multiplication. Alternatively, both
- X and Y may be valid for a different, narrower mode. This represents
- the kind of multiplication that generates a product wider than the
- operands. Widening multiplication and same-size multiplication are
- completely distinct and supported by different machine instructions;
- machines may support one but not the other.
-
- `mult' may be used for floating point multiplication as well. Then M
- is a floating point machine mode.
-
- `(umult:M X Y)'
- Like `mult' but represents unsigned multiplication. It may be used in
- both same-size and widening forms, like `mult'. `umult' is used only
- for fixed-point multiplication.
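-
- The same-size and widening forms of `mult' and `umult' can be sketched
- in C as follows. This is an illustration, not GCC source; the function
- names are hypothetical and the operands are assumed to be 32 bits wide
- (`SImode') with a 64-bit (`DImode') widened product.
-
```c
#include <stdint.h>

/* Same-size multiplication: the product is truncated to 32 bits.
   The low 32 bits are identical for signed and unsigned operands. */
uint32_t mult_same_size(uint32_t x, uint32_t y)
{
    return x * y;
}

/* Signed widening multiplication: 32 x 32 -> 64 bits. */
int64_t mult_widening(int32_t x, int32_t y)
{
    return (int64_t)x * (int64_t)y;
}

/* Unsigned widening multiplication: 32 x 32 -> 64 bits. */
uint64_t umult_widening(uint32_t x, uint32_t y)
{
    return (uint64_t)x * (uint64_t)y;
}
```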
-
- `(div:M X Y)'
- Represents the quotient in signed division of X by Y, carried out in
- machine mode M. If M is a floating-point mode, it represents the
- exact quotient; otherwise, the integerized quotient. If X and Y are
- both valid for mode M, this is ordinary size-preserving division.
- Some machines have division instructions in which the operands and
- quotient widths are not all the same; such instructions are
- represented by `div' expressions in which the machine modes are not
- all the same.
-
- `(udiv:M X Y)'
- Like `div' but represents unsigned division.
-
- `(mod:M X Y)'
- `(umod:M X Y)'
- Like `div' and `udiv' but represent the remainder instead of the
- quotient.
-
- `(not:M X)'
- Represents the bitwise complement of the value represented by X,
- carried out in mode M. X must be valid for mode M, which must be a
- fixed-point mode.
-
- `(and:M X Y)'
- Represents the bitwise logical-and of the values represented by X and
- Y, carried out in machine mode M. This is valid only if X and Y both
- are valid for mode M, which must be a fixed-point mode.
-
- `(ior:M X Y)'
- Represents the bitwise inclusive-or of the values represented by X and
- Y, carried out in machine mode M. This is valid only if X and Y both
- are valid for mode M, which must be a fixed-point mode.
-
- `(xor:M X Y)'
- Represents the bitwise exclusive-or of the values represented by X and
- Y, carried out in machine mode M. This is valid only if X and Y both
- are valid for mode M, which must be a fixed-point mode.
-
- `(lshift:M X C)'
- Represents the result of logically shifting X left by C places. X
- must be valid for the mode M, a fixed-point machine mode. C must be
- valid for a fixed-point mode; which mode is determined by the mode
- called for in the machine description entry for the left-shift
- instruction. For example, on the Vax, the mode of C is `QImode'
- regardless of M.
-
- On some machines, negative values of C may be meaningful; this is why
- logical left shift and arithmetic left shift are distinguished: with a
- negative count they act as logical and arithmetic right shifts, which
- differ in how the vacated high-order bits are filled. For example,
- Vaxes have no right-shift instructions, and right shifts are
- represented as left-shift instructions whose counts happen to be
- negative constants or else computed (in a previous instruction) by
- negation.
-
- `(ashift:M X C)'
- Like `lshift' but for arithmetic left shift.
-
- `(lshiftrt:M X C)'
- `(ashiftrt:M X C)'
- Like `lshift' and `ashift' but for right shift.
-
- `(rotate:M X C)'
- `(rotatert:M X C)'
- Similar but represent left and right rotate.
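-
- A minimal C sketch of the right-shift and rotate operations, assuming
- a 32-bit mode and shift counts from 0 to 31. The function names are
- hypothetical; the arithmetic shift is written out by hand because C
- leaves the right shift of a negative signed value up to the
- implementation.
-
```c
#include <stdint.h>

/* Logical right shift: vacated high bits are filled with zeros. */
uint32_t lshiftrt32(uint32_t x, unsigned c)
{
    return x >> (c & 31u);
}

/* Arithmetic right shift: vacated high bits copy the sign bit. */
uint32_t ashiftrt32(uint32_t x, unsigned c)
{
    uint32_t shifted;
    c &= 31u;
    shifted = x >> c;
    if ((x & 0x80000000u) && c != 0)
        shifted |= ~(0xFFFFFFFFu >> c);    /* set the top c bits */
    return shifted;
}

/* Left rotate: bits shifted out at the top re-enter at the bottom. */
uint32_t rotate32(uint32_t x, unsigned c)
{
    c &= 31u;
    return (x << c) | (x >> ((32u - c) & 31u));
}

/* Right rotate: bits shifted out at the bottom re-enter at the top. */
uint32_t rotatert32(uint32_t x, unsigned c)
{
    c &= 31u;
    return (x >> c) | (x << ((32u - c) & 31u));
}
```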
-
- `(abs:M X)'
- Represents the absolute value of X, computed in mode M. X must be
- valid for M.
-
- `(sqrt:M X)'
- Represents the square root of X, computed in mode M. X must be valid
- for M. Most often M will be a floating point mode.
-
- `(ffs:M X)'
- Represents one plus the index of the least significant 1-bit in X,
- represented as an integer of mode M. (The value is zero if X is
- zero.) The mode of X need not be M; depending on the target machine,
- various mode combinations may be valid.
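-
- The value computed by `ffs' can be sketched in C as below. The name
- `ffs32' is hypothetical and a 32-bit operand is assumed; this is a
- plain loop for illustration, not how any particular machine does it.
-
```c
#include <stdint.h>

/* One plus the index of the least significant 1-bit of x, or 0 if x is 0. */
int ffs32(uint32_t x)
{
    int index = 1;
    if (x == 0)
        return 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        index++;
    }
    return index;
}
```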
-
-
- File: internals, Node: Comparisons, Next: Bit Fields, Prev: Arithmetic, Up: RTL
-
- Comparison Operations
- =====================
-
- Comparison operators test a relation on two operands and are considered to
- represent the value 1 if the relation holds, or zero if it does not. The
- mode of the comparison is determined by the operands; they must both be
- valid for a common machine mode. A comparison with both operands constant
- would be invalid as the machine mode could not be deduced from it, but such
- a comparison should never exist in RTL due to constant folding.
-
- Inequality comparisons come in two flavors, signed and unsigned. Thus,
- there are distinct expression codes `gt' and `gtu' for signed and unsigned
- greater-than. These can produce different results for the same pair of
- integer values: for example, 1 is signed greater-than -1 but not unsigned
- greater-than, because -1 when regarded as a 32-bit unsigned value is
- actually `0xffffffff', which is greater than 1.
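-
- The two flavors can be illustrated in C, where the operand types select
- the signedness of the comparison. The functions `gt' and `gtu' below
- are hypothetical stand-ins for the RTL codes of the same names, and
- 32-bit operands are assumed.
-
```c
#include <stdint.h>

/* Signed greater-than: 1 if the relation holds, otherwise 0. */
int gt(int32_t x, int32_t y)
{
    return x > y;
}

/* Unsigned greater-than on the same 32-bit patterns. */
int gtu(uint32_t x, uint32_t y)
{
    return x > y;
}
```
-
- Here `gt(1, -1)' yields 1, while `gtu(1u, 0xffffffffu)' yields 0.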
-
- The signed comparisons are also used for floating point values. Floating
- point comparisons are distinguished by the machine modes of the operands.
-
- The comparison operators may be used to compare the condition codes `(cc0)'
- against zero, as in `(eq (cc0) (const_int 0))'. Such a construct actually
- refers to the result of the preceding instruction in which the condition
- codes were set. The above example stands for 1 if the condition codes were
- set to say ``zero'' or ``equal'', 0 otherwise. Although the same
- comparison operators are used for this as may be used in other contexts on
- actual data, no confusion can result since the machine description would
- never allow both kinds of uses in the same context.
-
- `(eq X Y)'
- 1 if the values represented by X and Y are equal, otherwise 0.
-
- `(ne X Y)'
- 1 if the values represented by X and Y are not equal, otherwise 0.
-
- `(gt X Y)'
- 1 if X is greater than Y, otherwise 0. If they are fixed-point, the
- comparison is done in a signed sense.
-
- `(gtu X Y)'
- Like `gt' but does unsigned comparison, on fixed-point numbers only.
-
- `(lt X Y)'
- `(ltu X Y)'
- Like `gt' and `gtu' but test for ``less than''.
-
- `(ge X Y)'
- `(geu X Y)'
- Like `gt' and `gtu' but test for ``greater than or equal''.
-
- `(le X Y)'
- `(leu X Y)'
- Like `gt' and `gtu' but test for ``less than or equal''.
-
- `(if_then_else COND THEN ELSE)'
- This is not a comparison operation but is listed here because it is
- always used in conjunction with a comparison operation. To be
- precise, COND is a comparison expression. This expression represents
- a choice, according to COND, between the value represented by THEN and
- the one represented by ELSE.
-
- On most machines, `if_then_else' expressions are valid only to express
- conditional jumps.
-
-
- File: internals, Node: Bit Fields, Next: Conversions, Prev: Comparisons, Up: RTL
-
- Bit-fields
- ==========
-
- Special expression codes exist to represent bit-field instructions. These
- types of expressions are lvalues in RTL; they may appear on the left side
- of an assignment, indicating insertion of a value into the specified bit
- field.
-
- `(sign_extract:SI LOC SIZE POS)'
- This represents a reference to a sign-extended bit-field contained or
- starting in LOC (a memory or register reference). The bit field is
- SIZE bits wide and starts at bit POS. The compilation option
- `BITS_BIG_ENDIAN' says which end of the memory unit POS counts from.
-
- Which machine modes are valid for LOC depends on the machine, but
- typically LOC should be a single byte when in memory or a full word in
- a register.
-
- `(zero_extract:SI LOC SIZE POS)'
- Like `sign_extract' but refers to an unsigned or zero-extended bit
- field. The same sequence of bits is extracted, but it is extended to
- a full word with zeros instead of by sign-extension.
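-
- The difference between the two extractions can be sketched in C. This
- is an illustration under stated assumptions, not GCC source: LOC is a
- 32-bit register value, POS counts from the least significant bit (one
- of the two conventions `BITS_BIG_ENDIAN' chooses between), SIZE is
- between 1 and 32, and the function names are hypothetical.
-
```c
#include <stdint.h>

/* Extract SIZE bits starting at bit POS, filling the rest with zeros. */
uint32_t zero_extract(uint32_t loc, unsigned size, unsigned pos)
{
    uint32_t mask = (size >= 32u) ? 0xFFFFFFFFu : ((1u << size) - 1u);
    return (loc >> pos) & mask;
}

/* Extract the same bits, but replicate the field's top bit upward. */
int32_t sign_extract(uint32_t loc, unsigned size, unsigned pos)
{
    int64_t field = zero_extract(loc, size, pos);
    int64_t sign_bit = (int64_t)1 << (size - 1u);
    if (field & sign_bit)
        field -= (int64_t)1 << size;   /* reinterpret as a negative value */
    return (int32_t)field;
}
```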
-
-
- File: internals, Node: Conversions, Next: RTL Declarations, Prev: Bit Fields, Up: RTL
-
- Conversions
- ===========
-
- All conversions between machine modes must be represented by explicit
- conversion operations. For example, an expression which is the sum of a
- byte and a full word cannot be written as `(plus:SI (reg:QI 34) (reg:SI
- 80))' because the `plus' operation requires two operands of the same
- machine mode. Therefore, the byte-sized operand is enclosed in a
- conversion operation, as in
-
- (plus:SI (sign_extend:SI (reg:QI 34)) (reg:SI 80))
-
- The conversion operation is not a mere placeholder, because there may be
- more than one way of converting from a given starting mode to the desired
- final mode. The conversion operation code says how to do it.
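-
- To see why the choice of conversion matters, here is a small C model
- of the sum above with each kind of extension. The function names are
- hypothetical, `QImode' is taken to be 8 bits and `SImode' 32 bits, and
- values are held as unsigned bit patterns so that the addition wraps.
-
```c
#include <stdint.h>

/* (sign_extend:SI X) for an 8-bit X: copy bit 7 into bits 8..31. */
uint32_t sign_extend_qi_si(uint8_t b)
{
    return (b & 0x80u) ? (0xFFFFFF00u | b) : b;
}

/* (zero_extend:SI X) for an 8-bit X: bits 8..31 become zero. */
uint32_t zero_extend_qi_si(uint8_t b)
{
    return b;
}

/* (plus:SI X Y): 32-bit addition, wrapping modulo 2^32. */
uint32_t plus_si(uint32_t x, uint32_t y)
{
    return x + y;
}
```
-
- With the byte `0xff' and the word 100, sign extension gives a sum of 99
- (the byte is treated as -1), while zero extension gives 355.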
-
- `(sign_extend:M X)'
- Represents the result of sign-extending the value X to machine mode M.
- M must be a fixed-point mode and X a fixed-point value of a mode
- narrower than M.
-
- `(zero_extend:M X)'
- Represents the result of zero-extending the value X to machine mode M.
- M must be a fixed-point mode and X a fixed-point value of a mode
- narrower than M.
-
- `(float_extend:M X)'
- Represents the result of extending the value X to machine mode M. M
- must be a floating point mode and X a floating point value of a mode
- narrower than M.
-
- `(truncate:M X)'
- Represents the result of truncating the value X to machine mode M. M
- must be a fixed-point mode and X a fixed-point value of a mode wider
- than M.
-
- `(float_truncate:M X)'
- Represents the result of truncating the value X to machine mode M. M
- must be a floating point mode and X a floating point value of a mode
- wider than M.
-
- `(float:M X)'
- Represents the result of converting fixed point value X, regarded as
- signed, to floating point mode M.
-
- `(unsigned_float:M X)'
- Represents the result of converting fixed point value X, regarded as
- unsigned, to floating point mode M.
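-
- In C terms the distinction between `float' and `unsigned_float' is
- carried by the type of the operand. The sketch below uses hypothetical
- function names and assumes a 32-bit fixed-point operand converted to a
- double-precision result.
-
```c
#include <stdint.h>

/* Model of (float:DF X): X is regarded as signed. */
double float_si(int32_t x)
{
    return (double)x;
}

/* Model of (unsigned_float:DF X): X is regarded as unsigned. */
double unsigned_float_si(uint32_t x)
{
    return (double)x;
}
```
-
- The all-ones 32-bit pattern converts to -1.0 when regarded as signed
- and to 4294967295.0 when regarded as unsigned.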
-
- `(fix:M X)'
- When M is a fixed point mode, represents the result of converting
- floating point value X to mode M, regarded as signed. How rounding is
- done is not specified, so this operation may be used validly in
- compiling C code only for integer-valued operands.
-
- `(unsigned_fix:M X)'
- Represents the result of converting floating point value X to fixed
- point mode M, regarded as unsigned. How rounding is done is not
- specified.
-
- `(fix:M X)'
- When M is a floating point mode, represents the result of converting
- floating point value X (valid for mode M) to an integer, still
- represented in floating point mode M, by rounding towards zero.
-
-
- File: internals, Node: RTL Declarations, Next: Side Effects, Prev: Conversions, Up: RTL
-
- Declarations
- ============
-
- Declaration expression codes do not represent arithmetic operations but
- rather state assertions about their operands.
-
- `(strict_low_part (subreg:M (reg:N R) 0))'
- This expression code is used in only one context: operand 0 of a `set'
- expression. In addition, the operand of this expression must be a
- `subreg' expression.
-
- The presence of `strict_low_part' says that the part of the register
- which is meaningful in mode N, but is not part of mode M, is not to be
- altered. Normally, an assignment to such a subreg is allowed to have
- undefined effects on the rest of the register when M is narrower than
- a word.
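-
- The effect that `strict_low_part' guarantees can be modeled in C as a
- masked store. This is only a sketch: the function name is hypothetical,
- and it assumes M is `QImode' (8 bits) and N is `SImode' (32 bits).
-
```c
#include <stdint.h>

/* Store an 8-bit value into the low part of a 32-bit register value,
   leaving bits 8..31 exactly as they were. */
uint32_t set_strict_low_part_qi(uint32_t reg, uint8_t value)
{
    return (reg & 0xFFFFFF00u) | value;
}
```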
-
-
-